Abstract

The Tsallis entropy is shown to be an additive entropy of degree-q that 
information scientists have been using for almost forty years. Neither is 
it a unique solution to the nonadditive functional equation from which 
random entropies are derived. Notions of additivity, extensivity and ho- 
mogeneity are clarified. The relation between mean code lengths in coding 
theory and various expressions for average entropies is discussed. 

1 The 'Tsallis' Entropy 

In 1988 Tsallis [1] published a much-quoted paper containing an expression
for the entropy which differed from the usual one used in statistical mechanics.
Prior to this, the Renyi entropy was used as an interpolation formula
that connected the Hartley-Boltzmann entropy to the Shannon-Gibbs entropy.
Although the Renyi entropy is additive, it lacks many other properties that
characterize the Shannon-Gibbs entropy. For example, the Renyi entropy is
neither subadditive nor recursive, and it possesses neither the branching nor
the sum property. The so-called Tsallis entropy fills this gap: while being
nonadditive, it has many other properties that resemble those of the
Shannon-Gibbs entropy. It is no wonder, then, that this entropy is regarded as
filling an important gap.

Yet it appears odd, to say the least, that information scientists should have
left such a gaping void in their analysis of entropy functions. A closer look at
the literature reveals that this is not the case: a normalized Tsallis entropy
seems to have first appeared in a 1967 paper by Havrda and Charvat, who
introduced the normalized 'Tsallis' entropy

    S_{n,q}(p_1,\ldots,p_n) = \frac{\sum_{i=1}^{n} p_i^q - 1}{2^{1-q} - 1}    (1)

for a complete set of probabilities, $p_i$, i.e. $\sum_{i=1}^{n} p_i = 1$, and
parameter $q > 0$, but $q \neq 1$. The latter requirement is necessary in order
that (1) possess the fundamental property of the entropy; that is, it is a
concave function. According to Tsallis [4], only for $q > 0$ is the entropy,
(1), said to be expansible [cf. (6) below].
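A minimal Python sketch, not part of the original paper, may make definition (1) concrete. The function names and the test distribution are illustrative assumptions. The sketch checks the normalization on the distribution (1/2, 1/2) and compares a value of q close to (but not equal to) 1 with the Shannon entropy in bits.

```python
import math

def additive_entropy(p, q):
    """Normalized additive entropy of degree q, eq. (1)."""
    if q <= 0 or q == 1:
        raise ValueError("eq. (1) requires q > 0 and q != 1")
    if abs(sum(p) - 1.0) > 1e-12:
        raise ValueError("probabilities must form a complete set")
    # Outcomes with p_i = 0 contribute nothing when q > 0.
    power_sum = sum(pi ** q for pi in p if pi > 0)
    return (power_sum - 1.0) / (2.0 ** (1.0 - q) - 1.0)

def shannon_bits(p):
    """Shannon entropy with logarithms to the base 2."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]
half_half = additive_entropy([0.5, 0.5], 2.0)   # normalization of (1)
near_one = additive_entropy(p, 1.0 + 1e-6)      # q close to 1
print(half_half, near_one, shannon_bits(p))
```

The denominator in (1) is what fixes the value on (1/2, 1/2) to unity, so the comparison with the Shannon entropy has to be made in bits rather than in natural units.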

2 Properties of Additive Entropy of Degree-q

The properties used to characterize the entropy are 

1. Concavity 

    S_{n,q}\left(\sum_{i=1}^{n} p_i x_i\right) \ge \sum_{i=1}^{n} p_i S_{n,q}(x_i)    (2)

where the nonnegative n-tuple, $(p) = (p_1,\ldots,p_n)$, forms a complete
probability distribution. For ordinary means, the n-tuple,
$(x) = (x_1,\ldots,x_n)$, represents a set of nonnegative numbers which
constitute a set of independent variables. The main difficulty in proving
theorems that characterize entropy functions in information theory is that the
'independent variables', (x), are not independent of their 'weights', (p).

Coding theory, to be discussed in the next section, derives the functional
dependencies in a very elegant way through optimization. The entropies
$S(x_i)$ represent the costs of encoding a sequence of lengths $x_i$, whose
probabilities are $p_i$. Minimizing the mean length associated with the cost
function, expressed as the weighted mean of the cost function, gives the
optimal codeword lengths $x_i$ as functions of their probabilities, $p_i$.
Consequently, the entropies that result when the $x_i$ are evaluated at their
optimal values by expressing them in terms of their probabilities, $p_i$,
constitute lower bounds to the mean lengths for the cost function.

2. Non-negativity 

    S_{n,q}(p_1,\ldots,p_n) \ge 0    (3)

3. Symmetry

    S_{n,q}(p_1,\ldots,p_n) = S_{n,q}(p_{[1]},\ldots,p_{[n]})    (4)

where [] denotes any arbitrary permutation of the indices on the proba- 
bilities. For the entropy, the symmetry property means that it should 
not depend upon the order in which the outcomes are labelled. 

4. The sum property 

    S_{n,q}(p_1,\ldots,p_n) = \sum_{i=1}^{n} \mathfrak{S}_{n,q}(p_i)    (5)

where $\mathfrak{S}_{n,q}$ is a measurable function on ]0, 1[.

5. Expansibility

    S_{n+1,q}(0, p_1,\ldots,p_n) = S_{n,q}(p_1,\ldots,p_n),    (6)

meaning that the entropy should not change when an outcome of proba- 
bility zero is added. 

6. Recursivity of degree-q

    S_{n,q}(p_1,\ldots,p_n) = S_{n-1,q}(p_1+p_2, p_3,\ldots,p_n)
        + (p_1+p_2)^q S_{2,q}\left(\frac{p_1}{p_1+p_2}, \frac{p_2}{p_1+p_2}\right)    (7)

asserting that if a choice is split into two successive choices, the original
entropy will be the weighted sum of the individual entropies. Recursivity
implies the branching property by requiring at the same time the additivity
of the entropy as well as the weighting of the different entropies by their
corresponding probabilities.

7. Normality 

    S_{2,q}(1/2, 1/2) = 1    (8)

8. Decisivity

    S_{2,q}(1, 0) = S_{2,q}(0, 1) = 0    (9)

9. Additivity of degree-q 

    S_{nm,q}(p_1q_1,\ldots,p_1q_m,\ldots,p_nq_m) = S_{n,q}(p_1,\ldots,p_n) + S_{m,q}(q_1,\ldots,q_m)
        + (2^{1-q} - 1) S_{n,q}(p_1,\ldots,p_n) S_{m,q}(q_1,\ldots,q_m)    (10)

for any two complete sets of probabilities, (p) and (q). As late as 1999,
Tsallis refers to (10) as exhibiting "a property which has apparently
never been focused before, and which we shall refer to as the composability
property." Here, composability means something different than in
information theory [2], in that it "concerns the nontrivial fact that the entropy
S(A + B) of a system composed of two independent subsystems A and B
can be calculated from the entropies S(A) and S(B) of the subsystems,
without any need of the microscopic knowledge about A and B, other than
the knowledge of some generic universality class, herein the nonextensive
universality class, represented by the entropic index q ..." [4].

However, the additive entropy of degree-q, (1), is not the only solution to
the functional equation (10) for $q \neq 1$. The average entropy




S^^^ip,,...,p^)^-^il-l^pU (11) 



also satisfies (10), with the only difference that $(1-q)/q$ replaces the
coefficient in the multiplicative term. Since the weighted mean of
degree-q is homogeneous, the pseudo-additive entropy (11) is a first-order
homogeneous function of (p): multiplying every $p_i$ by $\lambda$ multiplies
(11) by $\lambda$. It can be derived by averaging the same solution to the
functional equation (10), in the case $q \neq 1$, as that used to derive the
Tsallis entropy, except with a different exponent and normalizing factor, under
the constraint that the probability distribution is complete [5]. Although the
pseudo-additive entropy (11) lacks the property of recursivity, (7), it is
monotonic, continuous, and concave for all positive values of q. Weighted means
have been shown to be measures of the extent of a distribution, and (11)
relates the entropy to the weighted mean rather than to the more familiar
logarithm of the weighted mean, as in the case of the Shannon and Renyi
entropies.

Tsallis, in fact, associates additivity with extensivity in the sense that for
independent subsystems

    S_{nm,q}(p_1q_1,\ldots,p_nq_m) = S_{n,q}(p_1,\ldots,p_n) + S_{m,q}(q_1,\ldots,q_m)    (12)

According to Tsallis [4], superadditivity, q < 1, would correspond to
superextensivity, and subadditivity, q > 1, would correspond to
subextensivity. According to Callen [11], extensive parameters have values in a
composite system that are equal to the sum of the values in each of the
systems. Anything that is not extensive is labelled intensive, although
Tsallis would not agree [cf. (30) below]. For instance, if we consider
black-body radiation in a cavity of volume V, having an internal energy, U, and
magnify it $\lambda$ times, the resulting entropy

    \lambda S(U,V) = \frac{4}{3}\sigma^{1/4}(\lambda U)^{3/4}(\lambda V)^{1/4},    (13)

will be $\lambda$ times the original entropy, S(U,V), where $\sigma$ is the
Stefan-Boltzmann constant. Whereas extensivity involves magnifying all the
extensive variables by the same proportion, additivity in the sense of being
superadditive or subadditive deals with a subclass of extensive variables,
because the condition of extensivity of the entropy imposes that the
determinant formed from the second derivatives of the entropy vanish [12].
The entropy of black-body radiation, (13), is extensive yet it is subadditive
in either of the extensive variables. The property of subadditivity is
what Lorentz used to show how interactions lead to a continual increase
in entropy [12]. This is a simple consequence of Minkowski's inequality,

    u_1^{3/4} + u_2^{3/4} > (u_1 + u_2)^{3/4},

where u = U/V is the energy density. Hence, (sub- or super-) extensivity
is something very different from (sub- or super-) additivity.

10. Strong additivity of degree-q

    S_{mn,q}(p_1q_{11},\ldots,p_1q_{1m},\ldots,p_nq_{n1},\ldots,p_nq_{nm})
        = S_{n,q}(p_1,\ldots,p_n) + \sum_{i=1}^{n} p_i^q S_{m,q}(q_{i1},\ldots,q_{im})    (14)

where $q_{ij}$ is the conditional probability. Strong additivity of degree-q
describes the situation in which the sets of outcomes of two experiments
are not independent. Additivity of degree-q, (10), follows from strong
additivity by setting $q_{1k} = q_{2k} = \cdots = q_{nk} = q_k$, and taking (1)
into consideration [2].

A doubly stochastic matrix $(q_{ij})$, where m = n, is used in majorization
to distribute things, like income, more evenly [13], and this leads to an
increase in entropy. For if

    q_j = \sum_{i=1}^{n} q_{ij} p_i    (15)

and

    \sum_{j=1}^{n} q_j = \sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij} p_i = \sum_{i=1}^{n} p_i = 1,

it follows from the convexity of $\psi = x \ln x$, or

    \psi\left(\sum_{i=1}^{n} q_{ij} p_i\right) \le \sum_{i=1}^{n} q_{ij} \psi(p_i),

that

    S_{n,1}(q_1,\ldots,q_n) = -\sum_{i=1}^{n} q_i \ln q_i \ge -\sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij} p_i \ln p_i    (16)

since $\sum_{j=1}^{n} q_{ij} = 1$, so that the right-hand side is the entropy
$S_{n,1}(p_1,\ldots,p_n)$ of the original distribution. We say that p
majorizes q, $p \succ q$, if and only if (15) holds for some doubly stochastic
matrix $(q_{ij})$. A more even spread of incomes increases the entropy. Here we
are at the limits of equilibrium thermodynamics because we are invoking a
mechanism for the increase in entropy, which in the case of incomes means
taking from the rich and giving to the poor. This restricts q in the 'Tsallis'
entropy to ]0, 1[. Values of q in ]1, 2[ show an opposing tendency of balayage
or sweeping out. Whereas averaging tends to decrease inequality, balayage tends
to increase it.

Yet Tsallis refers to processes with q < 1, i.e. $p_i^q > p_i$, as rare events,
and to q > 1, i.e. $p_i^q < p_i$, as frequent events. However, only in the case
where q < 1 will the Shannon entropy, (16), be a lower bound to other
entropies, like the Renyi entropy

    S^R_{n,q}(p_1,\ldots,p_n) = \frac{1}{1-q} \ln\left(\sum_{i=1}^{n} p_i^q\right),    (17)

which is the negative logarithm of the weighted mean of $p_i^{q-1}$. The Renyi
entropy has the attributes of reducing to the Shannon-Gibbs entropy, (16),
in the limit as $q \to 1$, and to the Hartley-Boltzmann entropy

    S_{n,0}(1/n,\ldots,1/n) = \ln n    (18)

in the case of equal a priori probabilities $p_i = 1/n$. This leads to the
property of

11. n-maximal

    S_{n,q}(p_1,\ldots,p_n) \le S_{n,q}\left(\frac{1}{n},\ldots,\frac{1}{n}\right)    (19)

for any given integer $n \ge 2$. The right-hand side of (19) should be a
monotonic increasing function of n. As we have seen, the tendency of the
entropy to increase as the distribution becomes more uniform is due to
the property of concavity (2). Hence, it would appear that processes with
q < 1 would be compatible with the second law of thermodynamics, rather
than being rare exceptions to it!

12. Continuity: The entropy is a continuous function of its n variables. Small
changes in the probability cause correspondingly small changes in the
entropy. Additive entropies of degree-q are small for small probabilities,

    \lim_{p \to 0^+} S_{2,q}(p, 1-p) = \lim_{p \to 0^+} S_{2,q}(1-p, p) = 0.
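Several of the properties listed above can be checked numerically. The following Python sketch is an illustration rather than a proof. It assumes the normalized form (1) of the additive entropy of degree-q, and its helper names (`S`, `random_distribution`) and the choice q = 1.7 are hypothetical. It tests recursivity of degree-q, additivity of degree-q for independent distributions (with both single-system entropies and their product on the right-hand side), and strong additivity for conditional probabilities.

```python
import random

def S(p, q):
    """Additive entropy of degree q, eq. (1); assumes q > 0 and q != 1."""
    return (sum(x ** q for x in p if x > 0) - 1.0) / (2.0 ** (1.0 - q) - 1.0)

def random_distribution(n, rng):
    """Hypothetical helper: a random complete set of n probabilities."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(0)
q = 1.7
p = random_distribution(4, rng)
r = random_distribution(3, rng)

# Recursivity of degree q: merge p_1 and p_2, then split them again.
s = p[0] + p[1]
lhs_rec = S(p, q)
rhs_rec = S([s] + p[2:], q) + s ** q * S([p[0] / s, p[1] / s], q)

# Additivity of degree q for two independent distributions.
c = 2.0 ** (1.0 - q) - 1.0
lhs_add = S([pi * rj for pi in p for rj in r], q)
rhs_add = S(p, q) + S(r, q) + c * S(p, q) * S(r, q)

# Strong additivity: row i of `cond` holds the conditional probabilities q_{i1..im}.
cond = [random_distribution(3, rng) for _ in p]
lhs_strong = S([pi * cij for pi, row in zip(p, cond) for cij in row], q)
rhs_strong = S(p, q) + sum(pi ** q * S(row, q) for pi, row in zip(p, cond))

print(abs(lhs_rec - rhs_rec), abs(lhs_add - rhs_add), abs(lhs_strong - rhs_strong))
```

All three differences should vanish to rounding error, which is what "proven by direct calculation" amounts to for this particular functional form.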



3 Coding Theory and Entropy Functions 

The analogy between coding theory and entropy functions has long been known
[18]. If $k_1,\ldots,k_n$ are the lengths of codewords of a uniquely
decipherable code with D symbols then the average codeword length

    \sum_{i=1}^{n} p_i k_i    (20)

is bounded from below by the Shannon-Gibbs entropy (16) if the logarithm is
to the base D. The optimal codeword length is $k_i = -\ln p_i$, which
represents the information content in event $E_i$. If D = 2 then $p_i = 1/2$
contains exactly one bit of information.
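As a small illustration of this bound, the Python sketch below rounds the ideal lengths up to integers, which is an assumption added here so that the lengths describe an actual code. The function names and the two test distributions are hypothetical. It checks the Kraft sum and compares the mean length with the entropy to the base D.

```python
import math

def shannon_lengths(p, D=2):
    """Smallest integers k_i with D**(-k_i) <= p_i (ideal lengths rounded up)."""
    lengths = []
    for pi in p:
        k = 0
        while D ** (-k) > pi:
            k += 1
        lengths.append(k)
    return lengths

def mean_length(p, k):
    """Average codeword length, the weighted mean of the k_i."""
    return sum(pi * ki for pi, ki in zip(p, k))

def entropy(p, D=2):
    """Shannon-Gibbs entropy with logarithms to the base D."""
    return -sum(pi * math.log(pi, D) for pi in p if pi > 0)

dyadic = [0.5, 0.25, 0.125, 0.125]
skewed = [0.4, 0.3, 0.2, 0.1]
for p in (dyadic, skewed):
    k = shannon_lengths(p)
    kraft = sum(2.0 ** (-ki) for ki in k)   # must not exceed 1 for a decipherable code
    print(k, mean_length(p, k), entropy(p), kraft)
```

For the dyadic distribution the bound is attained, because every $-\log_2 p_i$ is already an integer; for the skewed one the rounding leaves a gap between the mean length and the entropy.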

Ordinarily, one tries to keep the average codeword length (20) small, but it
cannot be made smaller than the Shannon-Gibbs entropy. An economical code
has frequently occurring messages with large $p_i$ and small $k_i$. Rare
messages are those with small $p_i$ and large $k_i$. The solution
$k_i = -\ln p_i$ has the disadvantage that the codeword length is very great
if the probability of the symbol is very small. A better measure of the
codeword length would be

    \frac{1}{\tau} \log\left(\sum_{i=1}^{n} p_i D^{\tau k_i}\right)    (21)

where $\tau = (1-q)/q$, thereby limiting q to the interval [0, 1]. As
$\tau \to \infty$, the limit of (21) is the largest of the $k_i$, independent
of $p_i$. Therefore, if q is small enough, or $\tau$ large enough, the very
large $k_i$'s will contribute very strongly to the average codeword length
(21), thus keeping it from being small even for very small $p_i$. The optimal
codeword length is now



    k_i = -q \ln p_i + \ln\left(\sum_{j=1}^{n} p_j^q\right),

showing that the Renyi entropy is the lower bound to the average codeword
length (21) [18]. Just as the $p_i = D^{-k_i}$ are the optimum probabilities
for the Shannon-Gibbs entropy, the optimum probabilities for the Renyi entropy
are the so-called escort probabilities,

    D^{-k_i} = \frac{p_i^q}{\sum_{j=1}^{n} p_j^q}.    (22)

As $p_i \to 0$, the optimum value of $k_i$ is asymptotic to $-q \ln p_i$ so
that the optimum length is less than $-\ln p_i$ for q < 1 and sufficiently
small $p_i$. This provides additional support for keeping q within the
interval [0, 1] [16].
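The role of the escort probabilities can be illustrated numerically. The Python sketch below is hedged: it assumes that the exponential mean length has the form $(1/\tau)\log_D \sum_i p_i D^{\tau k_i}$ with $\tau = (1-q)/q$, that the Renyi entropy of order q is $\log_D(\sum_i p_i^q)/(1-q)$, and that the optimal lengths satisfy $D^{-k_i} = p_i^q / \sum_j p_j^q$. The function names and the test distribution are hypothetical, and the lengths are left real-valued rather than rounded to integers.

```python
import math

def renyi_entropy(p, q, D=2):
    """Renyi entropy of order q with logarithms to the base D (assumed form)."""
    return math.log(sum(x ** q for x in p), D) / (1.0 - q)

def exponential_mean_length(p, k, tau, D=2):
    """(1/tau) * log_D sum_i p_i * D**(tau * k_i) (assumed form of the length)."""
    return math.log(sum(pi * D ** (tau * ki) for pi, ki in zip(p, k)), D) / tau

def escort(p, q):
    """Escort probabilities p_i**q / sum_j p_j**q."""
    z = sum(x ** q for x in p)
    return [x ** q / z for x in p]

p = [0.6, 0.25, 0.1, 0.05]
q = 0.5
tau = (1.0 - q) / q

k_escort = [-math.log(e, 2) for e in escort(p, q)]   # lengths from escort probabilities
k_shannon = [-math.log(x, 2) for x in p]             # lengths from the p_i themselves

bound = renyi_entropy(p, q)
len_escort = exponential_mean_length(p, k_escort, tau)
len_shannon = exponential_mean_length(p, k_shannon, tau)
print(bound, len_escort, len_shannon)
```

With the escort lengths the exponential mean length coincides with the Renyi entropy, while the Shannon lengths $-\log_2 p_i$ give a larger value for the same distribution.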

Although the Renyi entropy is additive, it does not have other properties
listed above; for instance, it is not recursive and has neither the branching
property nor the sum property. It is precisely the 'Tsallis' entropy which
fills the gap: while not being additive, it has many of the other properties
that an entropy should have [19]. Therefore, in many ways the additive entropy
of degree-q (1) is closer to the Shannon entropy (16) than the Renyi entropy is.
The so-called additive entropies of degree-q can be written as

    S_{n,q}(p_1,\ldots,p_n) = \sum_{i=2}^{n} (p_1 + \cdots + p_i)^q f\left(\frac{p_i}{p_1 + \cdots + p_i}\right),    (23)

where the function f is a solution to the functional equation

    f(x) + (1-x)^q f\left(\frac{y}{1-x}\right) = f(y) + (1-y)^q f\left(\frac{x}{1-y}\right)

subject to f(0) = f(1), which was rederived by Curado and Tsallis [20], and the
property of additivity of degree-q (10) was referred to by them as
pseudo-additivity, omitting the original references. What these authors appear
to have missed are the properties of strong additivity, (14), and recursivity
of degree-q, (7). These properties can be proven by direct calculation using
the normalized additive entropy of degree-q, (1). Additive entropies of
degree-q > 1 are also subadditive. Moreover, additive entropies of degree-q
satisfy the sum property, (5), where

    \mathfrak{S}_q(p_i) = (p_i^q - p_i)/(2^{1-q} - 1) \ge 0.    (24)

Only for q > 0 will (24), and consequently (1), be concave since

    \mathfrak{S}''_q(p_i) = q(q-1)p_i^{q-2}/(2^{1-q} - 1) < 0,

where the prime stands for differentiation with respect to $p_i$. This is
contrary to the claim that the additive entropy of degree-q is "extremized for
all values of q". It can easily be shown that the concavity property

    \mathfrak{S}_q\left(\sum_{i=1}^{n} p_i x_i\right) \ge \sum_{i=1}^{n} p_i \mathfrak{S}_q(x_i)

implies the monotonic increase in the entropy (19). Setting the weights in the
concavity property equal to 1/n and the arguments $x_i$ equal to the
probabilities $p_i$, and using the sum property (5), leads to

    S_{n,q}(p_1,\ldots,p_n) = \sum_{i=1}^{n} \mathfrak{S}_q(p_i)
        \le n \mathfrak{S}_q\left(\sum_{i=1}^{n} \frac{p_i}{n}\right)
        = n \mathfrak{S}_q\left(\frac{1}{n}\right)
        = S_{n,q}\left(\frac{1}{n},\ldots,\frac{1}{n}\right),

showing that $S_{n,q}(1/n,\ldots,1/n)$ is maximal.
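The maximality argument can be probed numerically. The Python sketch below is only a sanity check under stated assumptions: it takes the single-probability function of the sum property to be $(p^q - p)/(2^{1-q} - 1)$, samples random distributions with a hypothetical helper, and records the largest amount by which any sampled entropy exceeds the entropy of the uniform distribution.

```python
import random

def sum_term(x, q):
    """Single-probability function of the sum property, (x**q - x)/(2**(1-q) - 1)."""
    return (x ** q - x) / (2.0 ** (1.0 - q) - 1.0)

def S(p, q):
    """Additive entropy of degree q written through the sum property."""
    return sum(sum_term(x, q) for x in p)

def random_distribution(n, rng):
    """Hypothetical helper: a random complete set of n strictly positive probabilities."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def max_excess(q, n=5, trials=1000, seed=1):
    """Largest value of S(p) - S(uniform) over random p; concavity predicts <= 0."""
    rng = random.Random(seed)
    uniform = [1.0 / n] * n
    return max(S(random_distribution(n, rng), q) - S(uniform, q) for _ in range(trials))

excess = {q: max_excess(q) for q in (0.3, 0.8, 1.5, 2.5)}
print(excess)
```

The sampled values of q lie on both sides of q = 1, where the sign of the denominator $2^{1-q} - 1$ changes but the concavity of the single-probability function does not.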

In order to obtain explicit expressions for the probabilities, Tsallis and
collaborators maximized their non-normalized entropy

    S^T_{n,q}(p_1,\ldots,p_n) = \left(\sum_{i=1}^{n} p_i^q - 1\right)/(1 - q)    (25)

with respect to certain constraints. Taking their cue from Jaynes' [21]
formalism of maximum entropy, (25) was to be maximized with respect to the
finite norm

    \int_{-\infty}^{\infty} p(x)\,dx = 1

and the so-called q average of the second moment [50]

    \int_{-\infty}^{\infty} x^2 [\sigma p(x)]^q \, d(x/\sigma) = \sigma^2.    (26)

The latter condition was introduced because the variance of the distribution
did not exist, and the weights, $(p^q)$, have been referred to as 'escort'
probabilities [cf. (22) above]. The resulting distribution is almost identical
to Student's distribution



    p(x|\mu) = \sqrt{\frac{\mu(q-1)}{\pi(3-q)}} \, \frac{\Gamma(1/(q-1))}{\Gamma((3-q)/2(q-1))}
        \left(1 + \mu \frac{(q-1)}{(3-q)} x^2\right)^{-1/(q-1)}    (27)

where (3 - q)/(q - 1) is the number of degrees of freedom, and $\mu$ is the
Lagrange multiplier for the constraint (26) [23].

The Gaussian distribution is the only stable law with a finite variance; all
the other stable laws have infinite variance. These stable laws have much
larger tails than the normal law, which is responsible for the infinite nature
of their variances. Their initial distributions are given by the intensity of
small jumps, where the intensity of jumps having the same sign of x, and
greater than x in absolute value, is

    F(x) = \frac{1}{x^{\beta}}    (28)

for x > 1. For $\beta < 1$, the generalized random process, which is of a
Poisson nature, produces only positive jumps, whose intensity (28) is always
increasing. No moments exist, and the fact that

    Z(\lambda) = e^{-\lambda^{\beta}}    (29)

where $\lambda$ is both positive and real, follows directly from Polya's
theorem: If for each $\lambda$, $Z(0) = 1$, $Z(\lambda) \ge 0$,
$Z(\lambda) = Z(-\lambda)$, and $Z(\lambda)$ is decreasing and continuous
convex on the right half interval, then $Z(\lambda)$ is a generating function
[25]. Convexity is easily checked for $0 < \beta < 1$, and it is concluded that
$Z(\lambda)$ is a generating function. In other words,

    1 - Z(\lambda) = -\int_{0}^{\infty} \left(1 - e^{-\lambda x}\right) dF(x) = \Gamma(1 - \beta)\lambda^{\beta}

exists for a positive argument of the Gamma function, and that implies
$\beta < 1$.

This does not hold on the interval $1 < \beta < 2$, where it makes sense to
talk about a compensated sum of jumps, since a finite mean exists. In the limit
$\beta = 2$, positive and negative jumps about the mean value become equally
probable and the Wiener-Levy process results, which is the normal limit. If
one introduces a centering term in the expression, $\lambda x$, the same
expression for the generating function, (29), is obtained to lowest power in
$\lambda$, as $\lambda \to 0$ and $x \to \infty$, such that their product is
finite.

These stable distributions, $0 < \beta < 1$, (and quasi-stable ones,
$1 < \beta < 2$, because the effect of partial compensation of jumps introduces
an arbitrary additive constant) are related to the process of super-diffusion,
where the asymptotic behavior of the generalized Poisson process has
independent increments with intensity (28). For strictly stable processes, the
super-diffusion packet spreads out faster than the packet of freely moving
particles, while a quasi-stable distribution describes the random walk of a
particle with a finite mean velocity. It was hoped that these tail
distributions could be described by an additive entropy of degree-q, where the
degree of additivity would be related to the exponent of the stable, or
quasi-stable, distribution. Following the lead of maximum entropy, where the
optimal distribution results from maximizing the entropy with all that is known
about the system, the same would hold true for maximizing the additive entropy
of degree-q. However, it was immediately realized that the variance of the
distribution does not exist.

Comparing the derivative of the tail density (28) with (27) identifies
$\beta = (3 - q)/(q - 1)$, requiring the stable laws to fall in the domain
$5/3 < q < 3$ [22]. However, it is precisely in the case in which we are
ignorant of the variance that the Student distribution is used to replace the
normal, since it has much fatter tails and only approaches the latter as the
number of degrees of freedom increases without limit [23]. Just as the ratio of
the difference of the mean of a sample and the mean of the distribution to the
standard deviation is distributed normally, the same ratio with the standard
deviation replaced by its estimator is distributed according to Student's
distribution. This distribution (27) was not unexpected, because it stands in
the same relation to the normal law as the 'Tsallis' entropy, (25), does to the
Shannon entropy, in the limit as the number of degrees of freedom is allowed to
increase without limit.
Whereas weighted means of order-q

    \left(\sum_{i=1}^{n} p_i x_i^q\right)^{1/q}

do have physical relevance for different values of q, the so-called q-expectation

    \sum_{i=1}^{n} p_i^q x_i

has no physical significance for values of $q \neq 1$. Since the connection
between statistical mechanics and thermodynamics lies in the association of
average values with thermodynamic variables, the q-expectations would lead to
incorrect averages. This explains why for Tsallis the internal energy of a
composite system is not the same as the sum of the internal energies of the
subsystems, and makes the question "if we are willing to consider the
nonadditivity of the entropy, why it is so strange to accept the same for the
energy?" completely meaningless. Yet,
the zeroth law of thermodynamics, and the derivation of the Tsallis nonintensive 
inverse temperature, 

    \beta = \frac{\partial S_{n,q} / \partial U_q}{1 + (1 - q) S_{n,q}},    (30)

where $U_q$ is the q-expectation of the internal energy, rest on the fact that
the total energy of the composite system is conserved [27].

It is as incorrect to speak of 'Tsallis' statistics as it would be to talk of 
Renyi statistics. These expressions are mere interpolation formulas leading to 
statistically meaningful expressions for the entropy in certain well-defined limits. 
Whereas for the Renyi entropy the limits $q \to 1$ and $q \to 0$ give the
Shannon-Gibbs and Hartley-Boltzmann entropies, respectively, without assuming
equal probabilities, the additive entropy of degree-q reduces to the Shannon
entropy in the limit as $q \to 1$, but it must further be assumed that the a
priori probabilities are equal in order to reduce it to the Hartley-Boltzmann
entropy. Hence, only
the Renyi entropies are true interpolation formulas. 
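The two limits of the Renyi entropy can be checked for a distribution that is deliberately not uniform. This Python sketch assumes the standard form $\ln(\sum_i p_i^q)/(1-q)$ of the Renyi entropy with natural logarithms; the function name, the test distribution and the particular values of q used to approach the limits are illustrative choices.

```python
import math

def renyi(p, q):
    """Renyi entropy of order q in natural units (assumed standard form)."""
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)

p = [0.5, 0.3, 0.2]                               # unequal probabilities
shannon = -sum(x * math.log(x) for x in p)        # Shannon-Gibbs entropy
hartley = math.log(len(p))                        # Hartley-Boltzmann entropy, ln n

near_one = renyi(p, 1.0 - 1e-6)                   # q close to 1
near_zero = renyi(p, 1e-9)                        # q close to 0
print(near_one, shannon)
print(near_zero, hartley)
```

Both limits are reached here without any assumption of equal a priori probabilities, which is the sense in which the Renyi entropy interpolates between the two classical expressions.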

Either the average of $-\ln p_i$, leading to the Shannon entropy, or the
negative of the logarithm of the weighted average of $p_i^{q-1}$, resulting in
the Renyi entropy, will give the property of additivity [21]. Whereas the
Shannon entropy is the negative of the logarithm of the geometric mean of the
probabilities,

    S_{n,1}(p_1,\ldots,p_n) = -\ln \mathfrak{G}_n(p_1,\ldots,p_n),

where

    \mathfrak{G}_n(p_1,\ldots,p_n) = \prod_{i=1}^{n} p_i^{p_i}

is the geometric mean, the Renyi entropy is the negative of the logarithm of the
weighted mean

    S^R_{n,q} = -\ln \mathfrak{M}_{q-1},

where

    \mathfrak{M}_{q-1} = \left(\sum_{i=1}^{n} p_i \, p_i^{q-1}\right)^{1/(q-1)}

is the weighted mean of $p_i^{q-1}$. If the logarithm is to the base 2, the
additive entropies of degree-q are exponentially related to the Renyi entropies
of order-q

    S_{n,q} = \frac{2^{(1-q) S^R_{n,q}} - 1}{2^{1-q} - 1},

which makes it apparent that they cannot be additive. But nonadditivity has
nothing to do with nonextensivity.
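The exponential relation can be verified directly. In this Python sketch the relation is written out as an assumption that follows from the two definitions when logarithms are taken to the base 2: the additive entropy of degree-q equals $(2^{(1-q)R} - 1)/(2^{1-q} - 1)$, with R the Renyi entropy of order q. The function names and test distributions are hypothetical. The sketch also contrasts the behaviour of the two entropies on a product of independent distributions.

```python
import math

def renyi_bits(p, q):
    """Renyi entropy of order q with logarithms to the base 2 (assumed form)."""
    return math.log2(sum(x ** q for x in p)) / (1.0 - q)

def additive(p, q):
    """Normalized additive entropy of degree q, eq. (1)."""
    return (sum(x ** q for x in p) - 1.0) / (2.0 ** (1.0 - q) - 1.0)

def additive_from_renyi(r, q):
    """Additive entropy of degree q expressed through the Renyi entropy r."""
    return (2.0 ** ((1.0 - q) * r) - 1.0) / (2.0 ** (1.0 - q) - 1.0)

q = 2.0
p = [0.5, 0.3, 0.2]
r = [0.7, 0.3]
joint = [pi * rj for pi in p for rj in r]         # independent subsystems

relation_gap = additive(p, q) - additive_from_renyi(renyi_bits(p, q), q)
renyi_gap = renyi_bits(joint, q) - (renyi_bits(p, q) + renyi_bits(r, q))
additive_gap = additive(joint, q) - (additive(p, q) + additive(r, q))
print(relation_gap, renyi_gap, additive_gap)
```

Because the Renyi entropy adds over independent subsystems and the additive entropy of degree-q is an exponential function of it, the latter picks up the cross term that appears in (10).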

As a concluding remark it may be of interest to note that undoubtedly the
oldest expression for an additive entropy of degree-2 was introduced by Gini
in 1912, who used it as an index of diversity or inequality. Moreover,
generalizations of additive entropies of degree-q are well-known. It has been
claimed that "Tsallis changed the mathematical form of the definition of entropy
and introduced a new parameter q" [30]. Generalizations that introduce additive
entropies of degree-g + — 1 |ST] 



with n + 1 parameters, should give even better results when it comes to curve
fitting.